A community index of third-party packages for Apache Spark.

Showing packages 51-100 of 105 for search tags:"Machine Learning"

Adaptation of the CluStream method in Spark

@obackhoff / Latest release: 0.6.5 (2016-03-31) / Apache-2.0 / (1)

  • 1|clustering
  • 1|streaming
  • 1|machine learning


MLeap allows for easily putting Spark ML pipelines into production

@TrueCar / Latest release: 0.1.5 (2016-06-06) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Parallelized Stochastic Gradient Descent (SGD) with Apache Spark

@yu-iskw / Latest release: 0.0.2 (2016-03-30) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning


Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark

@fabuzaid21 / Latest release: 1.0.1 (2018-05-11) / Apache-2.0 / (1)

  • 1|machine learning


Kuromoji Tokenizer for Spark DataFrame

@yu-iskw / Latest release: 1.2.0 (2016-06-29) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning


Rich Spark adds more to Apache Spark

@mashin-io / No release yet / (0)

  • 1|ml
  • 1|library
  • 1|streaming


Ranking algorithms for Spark DataFrame

@yu-iskw / Latest release: 0.0.4 (2016-08-26) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning
  • 1|scala


Scalable implementation of artificial neural networks for Spark deep learning

@avulanov / Latest release: 1.0.0 (2016-09-09) / Apache-2.0 / (1)

  • 1|deep learning
  • 1|machine learning


A Scala implementation of Annoy, which searches for the nearest neighbors of a given query point. Ann4s also provides a DataFrame-based API for Apache Spark.

@mskimm / No release yet / (0)

  • 1|kNN
  • 1|machine learning


An example using Spark ML and StanfordNLP for topic discovery with LDA clustering

@shiv4nsh / No release yet / (0)

  • 1|clustering
  • 1|spark
  • 1|scala


Distributed deep learning with Keras and Apache Spark.

@JoeriHermans / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


Positive-Unlabeled Learning for Apache Spark

@ispras / No release yet / (0)

  • 1|machine learning


A parallel implementation of word2vec based on Spark

@chen-lin / No release yet / (1)

  • 1|machine learning


A parallel implementation of factorization machines based on Spark

@chen-lin / No release yet / (1)

  • 1|factorization machines
  • 1|machine learning


A spectral (third-order tensor decomposition) method for learning LDA topic models on Spark.

@FurongHuang / Latest release: 1.0 (2016-12-04) / Apache-2.0 / (1)

  • 1|machine learning


Distributed Linear Programming Solver with Apache Spark

@ehsanmok / No release yet / (1)

  • 1|machine learning
  • 1|optimization
  • 1|convex


A self-organizing map for Scala and Spark

@ShokuninSan / No release yet / (0)

  • 1|machine learning


Twitter Sentiment Analysis - PySpark

@DayneSorvisto / No release yet / (1)

  • 1|twitter
  • 1|machine learning
  • 1|pyspark


A Nearest Neighbor Classifier for High-Speed Big Data Streams with Instance Selection

@sramirez / Latest release: 0.8 (2017-01-27) / Apache-2.0 / (0)

  • 1|streaming
  • 1|machine learning
  • 1|instance selection


Implementation of the Loopy Belief Propagation algorithm for Apache Spark

@HewlettPackard / No release yet / (0)

  • 1|graph
  • 1|machine learning


A simple tool for plotting Spark ML's Decision Trees

@julioasotodv / Latest release: 0.2 (2017-03-25) / MIT / (1)

  • 1|machine learning
  • 1|pyspark


Noise Framework for removing noisy instances with three algorithms: HME-BD, HTE-BD and ENN.

@djgarcia / Latest release: 1.2 (2018-04-18) / Apache-2.0 / (2)

  • 1|noise
  • 1|ensemble
  • 1|machine learning


Deep Learning Pipelines for Apache Spark

@databricks / Latest release: 1.5.0-spark2.4-s_2.11 (2019-01-25) / Apache-2.0 / (3)

  • 1|deep learning
  • 1|machine learning
  • 1|GPU


Deep Learning for MLlib

@JeremyNixon / No release yet / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Microsoft Machine Learning for Apache Spark

@Azure / Latest release: 0.17 (2019-04-23) / MIT / (4)

  • 3|ml
  • 3|Microsoft
  • 3|machine learning


Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)

  • 1|machine learning
  • 1|tools
  • 1|pyspark


MRQAR is a new generic parallel framework to discover quantitative association rules.

@djgarcia / Latest release: 1.0 (2017-07-28) / Apache-2.0 / (2)

  • 1|association rules
  • 1|big data
  • 1|machine learning


Affinity Propagation on Spark

@viirya / Latest release: 1.0 (2017-07-29) / MIT / (0)

  • 1|clustering
  • 1|affinity propagation
  • 1|machine learning


Spark implementation of k-medoids clustering algorithm

@tdebatty / Latest release: 0.1.2 (2017-09-24) / MIT / (1)

  • 1|clustering
  • 1|machine learning


Natural Language Processing Library for Apache Spark.

@JohnSnowLabs / Latest release: 3.0.1 (2021-04-02) / Apache-2.0 / (5)

  • 2|NLP
  • 2|machine-learning
  • 2|pyspark


PhysOnline: An Open Source Machine Learning Pipeline for Real-Time Analysis of Streaming Physiological Waveform

@rkamaleswaran / No release yet / (1)

  • 1|machine learning
  • 1|scala
  • 1|real-time


Scalable clustering library

@beckgael / No release yet / (0)

  • 1|clustering
  • 1|machine learning
  • 1|scala


Smart Filtering framework for Big Data

@djgarcia / Latest release: 1.0 (2018-04-09) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Smart Imputation. k Nearest Neighbor Imputation methods

@JMailloH / Latest release: 1.0 (2018-04-11) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Topological Data Analysis Package

@ognis1205 / No release yet / (0)

  • 1|ml
  • 1|topological data analysis
  • 1|machine learning


Bagging-RandomMiner ensemble method for anomaly detection

@wuicho-pereyra / Latest release: 1.0 (2018-05-22) / Apache-2.0 / (1)

  • 1|spark
  • 1|big data
  • 1|machine learning


Automated machine learning for structured data

@salesforce / Latest release: 0.7.0 (2020-06-12) / BSD 3-Clause / (5)

  • 2|ml
  • 2|machine-learning
  • 2|scala


Isolation Forest on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


Equal Width Discretizer

@djgarcia / Latest release: 1.0 (2018-10-01) / Apache-2.0 / (1)

  • 1|discretization
  • 1|big data
  • 1|machine learning


Similarity encoding of dirty categorical variables (strings)

@rakutentech / No release yet / (1)

  • 1|ml
  • 1|machine learning
  • 1|pyspark


Locality Sensitive Hashing for Apache Spark

@marufaytekin / No release yet / (0)

  • 1|clustering
  • 1|recommendation
  • 1|machine learning


Extensions for Spark ML/MLlib

@chitralverma / Latest release: 0.1 (2018-12-25) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


HS_FkNN: Hybrid Spill Tree Fuzzy k Nearest Neighbors.

@JMailloH / Latest release: 1.0 (2018-12-30) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Ensemble Estimators for Apache Spark ML

@pierrenodet / Latest release: 0.4.0 (2019-02-16) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Iterative Ensemble Noise Filter for Big Data

@djgarcia / Latest release: 1.0 (2019-06-10) / Apache-2.0 / (1)

  • 1|machine learning
  • 1|big data
  • 1|data mining


Stream Data Mining Library for Spark Streaming

@huawei-noah / Latest release: 0.0.1 (2019-07-21) / Apache-2.0 / (1)

  • 1|streaming
  • 1|machine learning
  • 1|scala


P-spectrum embedding and sequence relaxation for NLP in Spark

@sirCamp / Latest release: 1.0.0 (2019-08-07) / Apache-2.0 / (0)

  • 1|ml
  • 1|spark
  • 1|machine learning


Complexity metrics for big data problems.

@JMailloH / Latest release: 1.0 (2019-10-17) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Multi-class performance metric AUCmu for Apache Spark

@poweihuang / Latest release: 1.0.0 (2019-10-21) / MIT / (1)

  • 1|machine learning
  • 1|pyspark